We consider two Riemannian geometries for the manifold $\mathcal{M}(p, m\times n)$ of all $m\times n$ matrices of rank $p$. The geometries are induced on $\mathcal{M}(p, m\times n)$ by viewing it as the base manifold of the submersion $\pi : (M, N) \mapsto MN^T$, selecting an adequate Riemannian metric on the total space, and turning $\pi$ into a Riemannian submersion. The theory of Riemannian submersions, an important tool in Riemannian geometry, makes it possible to obtain expressions for fundamental geometric objects on $\mathcal{M}(p, m\times n)$ and to formulate the Riemannian Newton methods on $\mathcal{M}(p, m\times n)$ induced by these two geometries. The Riemannian Newton methods admit a stronger and more streamlined convergence analysis than the Euclidean counterpart, and the computational overhead due to the Riemannian geometric machinery is shown to be mild. Potential applications include low-rank matrix completion and other low-rank matrix approximation problems.

1 Introduction

Let $m$, $n$, and $p < \min\{m, n\}$ be positive integers and let $\mathcal{M}(p, m\times n)$ denote the set of all rank-$p$ matrices of size $m\times n$,

\[ \mathcal{M}(p, m\times n) = \{X \in \mathbb{R}^{m\times n} : \operatorname{rank}(X) = p\}. \tag{1} \]

Given a smooth function $f : \mathcal{M}(p, m\times n) \to \mathbb{R}$, we consider the problem

\[ \min f(X) \quad \text{subject to} \quad X \in \mathcal{M}(p, m\times n). \tag{2} \]

Problem (2) subsumes low-rank matrix approximation problems, where $f(X) = \|A - X\|^2$ with $A \in \mathbb{R}^{m\times n}$ given and $\|\cdot\|$ a (semi)norm. In particular, it includes low-rank matrix completion problems, which have been the topic of much attention recently; see [KMO10, DMK11, BA11, Van11, MMBS11, DKM12] and references therein. Interestingly, low-rank matrix completion problems combine two sparsity aspects: only a few elements of $A$ are available, and the vector of singular values of $X$ is restricted to have only a few nonzero elements.

This paper belongs to a trend of research, see [HM94, HS95, SE10, Van11, MMS11, MMBS11], where problem (2) is tackled using differential-geometric techniques exploiting the fact that $\mathcal{M}(p, m\times n)$ is a submanifold of $\mathbb{R}^{m\times n}$. We are interested in Riemannian Newton methods (see [Smi94, ADM+02, AMS08]) for problem (2), with a preference for the pure Riemannian setting [Smi94]. This setting involves defining a Riemannian metric on $\mathcal{M}(p, m\times n)$ and providing an expression for the Riemannian connection, which underlies the Riemannian Hessian, and for the Riemannian exponential. When $\mathcal{M}(p, m\times n)$ is viewed as a Riemannian submanifold of $\mathbb{R}^{m\times n}$, the necessary ingredients for computing the Riemannian Hessian are available [Van11, §2.3], but a closed-form expression of the Riemannian exponential has been elusive in that geometry.

In this paper, we follow a different approach that strongly relies on two-term factorizations of low-rank matrices. To this end, let

\[ \mathbb{R}_*^{m\times p} = \{X \in \mathbb{R}^{m\times p} : \operatorname{rank}(X) = p\} \tag{3} \]

denote the set of all full-rank $m\times p$ matrices, and observe that, since the function

\[ \pi : \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p} \to \mathcal{M}(p, m\times n) : (M, N) \mapsto MN^T \tag{4} \]

is surjective, problem (2) amounts to the optimization over its domain of the function $\bar f = f \circ \pi$, i.e.,

\[ \bar f : \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p} \to \mathbb{R} : (M, N) \mapsto f(MN^T). \tag{5} \]

Pleasantly, whereas $\mathcal{M}(p, m\times n)$ is a nonlinear space, $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ is an open subset of a linear space; more precisely, $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ is the linear space $\mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$ with a nowhere dense set removed. The downside is that the minimizers of $\bar f$ are never isolated; indeed, for all $(M, N) \in \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$, $\bar f = f \circ \pi$ assumes the same value $\bar f(M, N)$ at all points of

\[ \pi^{-1}(MN^T) = \{(MR, NR^{-T}) : R \in \mathrm{GL}(p)\}, \tag{6} \]

where

\[ \mathrm{GL}(p) = \{R \in \mathbb{R}^{p\times p} : \det(R) \neq 0\} \]

denotes the general linear group of degree $p$. In the context of Newton-type methods, this can be a source of concern since, whereas the convergence theory of Newton's method to nondegenerate minimizers is well understood (see, e.g., [DS83, Theorem 5.2.1]), the situation becomes more intricate in the presence of non-isolated minimizers (see, e.g., [GR85]).
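As a quick numerical illustration of why minimizers cannot be isolated in the total space, the following sketch checks that the lifted cost takes the same value at two points of one fiber (6). It uses NumPy; the sizes, the data matrix `A`, the Frobenius-norm cost and the name `f_bar` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2
A = rng.standard_normal((m, n))      # illustrative data matrix
M = rng.standard_normal((m, p))      # full rank with probability one
N = rng.standard_normal((n, p))

def f_bar(M, N):
    """Lifted cost f(M N^T) for the illustrative choice f(X) = ||A - X||_F^2."""
    return np.linalg.norm(A - M @ N.T, "fro") ** 2

# An invertible p x p matrix (det = 5.5) moves (M, N) along its fiber.
R = np.array([[2.0, 1.0], [0.5, 3.0]])
M2, N2 = M @ R, N @ np.linalg.inv(R).T

same_product = np.allclose(M @ N.T, M2 @ N2.T)
same_cost = np.isclose(f_bar(M, N), f_bar(M2, N2))
print(same_product, same_cost)
```

Any minimizer of the lifted cost therefore comes with a whole $p^2$-dimensional family of minimizers, which is the degeneracy the quotient construction is meant to remove.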

The proposed remedy to this downside consists in elaborating a Riemannian Newton method that evolves conceptually on $\mathcal{M}(p, m\times n)$, thereby avoiding the structural degeneracy in $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$, while still being formulated in $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$. This is made possible by endowing $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ and $\mathcal{M}(p, m\times n)$ with Riemannian metrics that turn $\pi$ into a Riemannian submersion. The theory of Riemannian submersions [O'N66, O'N83] then provides a way of representing the Riemannian connection and the Riemannian exponential of $\mathcal{M}(p, m\times n)$ in terms of the same objects of $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$.

It should be pointed out that the local quadratic convergence of the Riemannian Newton method is retained if the Riemannian connection is replaced by any affine connection and the Riemannian exponential is replaced by any first-order approximation, termed retraction; see [AMS08, §6.3]. The preference for the pure Riemannian setting is thus mainly motivated by the mathematical elegance of a method fully determined by the sole Riemannian metric.

Some of the material of this paper is inspired by the PhD thesis [Mey11] and the talk [ADY09].

The paper is organized as follows. In the short Sections 2 and 3, we show that $\pi$ is a submersion and we recall some fundamentals of Riemannian submersions. A first, natural but unsuccessful attempt at turning $\pi$ into a Riemannian submersion is presented in Section 4. Two ways of achieving success are then presented in Sections 5 and 6. In Section 5, the strategy consists of introducing a non-Euclidean Riemannian metric on $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$, whereas in Section 6, the plan of action is to restrict $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ by imposing orthonormality of one of the factors. We obtain closed-form expressions for the Riemannian connection (in both cases) and for the Riemannian exponential (in the latter case). Conclusions are drawn in Section 7.

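2 $\mathcal{M}(p, m\times n)$ as a quotient manifold

The set $\mathcal{M}(p, m\times n)$ of rank-$p$ matrices of size $m\times n$ is known to be an embedded submanifold of dimension $p(m + n - p)$ of $\mathbb{R}^{m\times n}$, connected whenever $\max\{m, n\} > 1$; see [HM94, Ch. 5, Prop. 1.14]. Hence $\pi$ (4) is a smooth surjective map between two manifolds.

We show that $\pi$ is a submersion, i.e., that the differential of $\pi$ is everywhere surjective. Observe that the tangent space to $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ at $(M, N)$ is given by

\[ T_{(M,N)}\,\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p} = \mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}; \]

this comes from the fact that $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ is an open submanifold of the Euclidean space $\mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$ [AMS08, §3.5.1]. For all $(M, N) \in \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ and all $(\dot M, \dot N) \in \mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$, we have $\mathrm{D}\pi(M, N)[(\dot M, \dot N)] = \dot M N^T + M \dot N^T$. Working in a coordinate system where $M = [I \;\; 0]^T$ and $N = [I \;\; 0]^T$, one readily sees that the dimension of the range of the map $(\dot M, \dot N) \mapsto \mathrm{D}\pi(M, N)[(\dot M, \dot N)]$ is equal to $p(m + n - p)$, the dimension of the codomain of $\pi$. Hence $\pi$ is a submersion.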
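The dimension count can also be checked numerically. The sketch below assembles the matrix of the linear map $(\dot M, \dot N) \mapsto \dot M N^T + M \dot N^T$ at a random point and computes its rank. The sizes and the helper name `dpi` are illustrative assumptions, and a random point is of course only one sample, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 5, 4, 2
M = rng.standard_normal((m, p))
N = rng.standard_normal((n, p))

def dpi(M, N, Mdot, Ndot):
    """Differential of pi(M, N) = M N^T applied to the direction (Mdot, Ndot)."""
    return Mdot @ N.T + M @ Ndot.T

# Apply the differential to each canonical basis direction of R^{m x p} x R^{n x p}.
dim_total = m * p + n * p
cols = []
for k in range(dim_total):
    e = np.zeros(dim_total)
    e[k] = 1.0
    Mdot = e[: m * p].reshape(m, p)
    Ndot = e[m * p:].reshape(n, p)
    cols.append(dpi(M, N, Mdot, Ndot).ravel())

J = np.column_stack(cols)              # matrix of D pi(M, N), size (m*n) x (m*p + n*p)
rank_J = np.linalg.matrix_rank(J)
print(rank_J, p * (m + n - p))         # rank of D pi versus dim of the rank-p manifold
print(dim_total - rank_J, p * p)       # kernel dimension versus fiber dimension
```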

As a consequence, by the submersion theorem [AMS08, Proposition 3.3.3], the fibers $\pi^{-1}(MN^T)$ are $p^2$-dimensional submanifolds of $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$. Moreover, by [AMR88, Proposition 3.5.23], the equivalence relation $\sim$ on $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$, defined by $(M_a, N_a) \sim (M_b, N_b)$ if and only if $\pi(M_a, N_a) = \pi(M_b, N_b)$, is regular and $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p} / \sim$ is a quotient manifold diffeomorphic to $\mathcal{M}(p, m\times n)$.

3 Riemannian submersion: principles 

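Turning $\pi$ into a Riemannian submersion amounts to endowing its domain $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ with a Riemannian metric $\bar g$ that satisfies a certain invariance condition, described next.

By definition, the vertical space $\mathcal{V}_{(M,N)}$ at a point $(M, N) \in \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ is the tangent space to the fiber $\pi^{-1}(MN^T)$. We obtain

\[ \mathcal{V}_{(M,N)} = \{(M\dot R, -N\dot R^T) : \dot R \in \mathbb{R}^{p\times p}\}. \tag{7} \]

Let $\bar g$ be a Riemannian metric on $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$. Then one defines the horizontal space $\mathcal{H}_{(M,N)}$ at $(M, N)$ to be the orthogonal complement of $\mathcal{V}_{(M,N)}$ in $\mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$ relative to $\bar g_{(M,N)}$, i.e.,

\[ \mathcal{H}_{(M,N)} = \{(\dot M, \dot N) \in \mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p} : \bar g_{(M,N)}\bigl((\dot M, \dot N), (M\dot R, -N\dot R^T)\bigr) = 0, \ \forall \dot R \in \mathbb{R}^{p\times p}\}. \tag{8} \]

Next, given a tangent vector $\dot X_{MN^T} \in T_{MN^T}\mathcal{M}(p, m\times n)$, there is one and only one

\[ \dot X_{(M,N)} \in \mathcal{H}_{(M,N)} \quad \text{such that} \quad \mathrm{D}\pi(M, N)[\dot X_{(M,N)}] = \dot X_{MN^T}, \tag{9} \]

where $\mathrm{D}\pi(X)[\dot X]$ denotes the differential of $\pi$ at $X$ applied to $\dot X$. This $\dot X_{(M,N)}$ is termed the horizontal lift of $\dot X_{MN^T}$ at $(M, N)$. (In order to lighten the notation, we use the same symbol for a tangent vector to $\mathcal{M}(p, m\times n)$ and its horizontal lift; the distinction is clear from the subscript or from the context.) If (and only if), for all $(M, N) \in \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$, all $\dot X_{MN^T}, \check X_{MN^T} \in T_{MN^T}\mathcal{M}(p, m\times n)$, and all $R \in \mathrm{GL}(p)$, it holds that

\[ \bar g_{(M,N)}\bigl(\dot X_{(M,N)}, \check X_{(M,N)}\bigr) = \bar g_{(MR, NR^{-T})}\bigl(\dot X_{(MR, NR^{-T})}, \check X_{(MR, NR^{-T})}\bigr), \tag{10} \]

then there is a (unique) Riemannian metric $g$ on $\mathcal{M}(p, m\times n)$ consistently defined by

\[ g_{MN^T}\bigl(\dot X_{MN^T}, \check X_{MN^T}\bigr) = \bar g_{(M,N)}\bigl(\dot X_{(M,N)}, \check X_{(M,N)}\bigr). \]

The submersion $\pi : (\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}, \bar g) \to (\mathcal{M}(p, m\times n), g)$ is then termed a Riemannian submersion, and $(\mathcal{M}(p, m\times n), g)$ is termed a Riemannian quotient manifold of $(\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}, \bar g)$. (We will sometimes omit the Riemannian metrics in the notation when they are clear from the context or undefined.)

In summary, in order to turn $\pi$ into a Riemannian submersion, we "just" have to choose a Riemannian metric $\bar g$ of $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ that satisfies the invariance condition (10).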

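4 $\mathcal{M}(p, m\times n)$ as a non-Riemannian quotient manifold

In this section, we consider on $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ the Euclidean metric $\bar g$, defined by

\[ \bar g_{(M,N)}\bigl((\dot M, \dot N), (\check M, \check N)\bigr) := \operatorname{trace}(\dot M^T \check M) + \operatorname{trace}(\dot N^T \check N), \tag{11} \]

and we show that the invariance condition (10) does not hold. Hence $\pi : (\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}, \bar g) \to \mathcal{M}(p, m\times n)$ cannot be turned into a Riemannian submersion.

The horizontal space (8) is

\[ \mathcal{H}_{(M,N)} = \{(\dot M, \dot N) : \operatorname{trace}(\dot M^T M \dot R) + \operatorname{trace}(-\dot N^T N \dot R^T) = 0, \ \forall \dot R \in \mathbb{R}^{p\times p}\}. \]

Using the identities $\operatorname{trace}(A) = \operatorname{trace}(A^T)$ and $\operatorname{trace}(AB) = \operatorname{trace}(BA)$, we obtain the identity $\operatorname{trace}(\dot M^T M \dot R) + \operatorname{trace}(-\dot N^T N \dot R^T) = \operatorname{trace}\bigl(\dot R^T (M^T \dot M - \dot N^T N)\bigr)$. It follows that the following propositions are equivalent:

\[ (\dot M, \dot N) \in \mathcal{H}_{(M,N)}, \]
\[ M^T \dot M = \dot N^T N, \]
\[ \exists\, L_M, L_N, S : \quad \begin{cases} \dot M = M_\perp L_M + M (M^T M)^{-1} S, \\ \dot N = N_\perp L_N + N (N^T N)^{-1} S^T, \end{cases} \]

where $M_\perp$ denotes an orthonormal $m \times (m - p)$ matrix such that $M^T M_\perp = 0$, and likewise for $N_\perp$.

Let $X = MN^T$ and let $\dot X_{MN^T} \in T_{MN^T}\mathcal{M}(p, m\times n)$. We seek an expression for the horizontal lift $\dot X_{(M,N)} = (\dot X_{M(M,N)}, \dot X_{N(M,N)})$ of $\dot X_{MN^T}$ at $(M, N)$, defined by (9). By a reasoning similar to the one detailed in Section 5.3 below, we obtain

\[ \dot X_{M(M,N)} = (\dot X_{MN^T} N - MK)(N^T N)^{-1} \quad \text{and} \quad \dot X_{N(M,N)} = (\dot X_{MN^T}^T M - NK^T)(M^T M)^{-1}, \]

where $K$ solves the Sylvester equation

\[ M^T M K + K N^T N = M^T \dot X_{MN^T} N. \]

One sees by inspection, or by a numerical check, that the invariance condition (10) does not hold, and this concludes the argument.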
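One possible form of such a numerical check is sketched below with NumPy and SciPy. It lifts the same tangent vector at two points of one fiber, using the lift formulas of this section, and compares the squared Euclidean norms (11) of the two lifts. The function names, sizes and random data are illustrative assumptions; `scipy.linalg.solve_sylvester(a, b, q)` solves $aX + Xb = q$.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def lift_euclid(M, N, Xdot):
    """Horizontal lift of Xdot at (M, N) for the Euclidean metric (11)."""
    # K solves  M^T M K + K N^T N = M^T Xdot N.
    K = solve_sylvester(M.T @ M, N.T @ N, M.T @ Xdot @ N)
    XM = (Xdot @ N - M @ K) @ np.linalg.inv(N.T @ N)
    XN = (Xdot.T @ M - N @ K.T) @ np.linalg.inv(M.T @ M)
    return XM, XN

def sq_norm_euclid(lift):
    """Squared norm of a lifted vector in the Euclidean metric (11)."""
    XM, XN = lift
    return np.trace(XM.T @ XM) + np.trace(XN.T @ XN)

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2
M = rng.standard_normal((m, p))
N = rng.standard_normal((n, p))
# A tangent vector at M N^T, built as an element of the range of D pi(M, N).
Xdot = rng.standard_normal((m, p)) @ N.T + M @ rng.standard_normal((n, p)).T

R = np.array([[2.0, 1.0], [0.5, 3.0]])      # invertible, not orthogonal
M2, N2 = M @ R, N @ np.linalg.inv(R).T      # same fiber: M2 N2^T = M N^T

lift1 = lift_euclid(M, N, Xdot)
lift2 = lift_euclid(M2, N2, Xdot)
norm1, norm2 = sq_norm_euclid(lift1), sq_norm_euclid(lift2)
print(norm1, norm2)   # condition (10) would require these two numbers to coincide
```

The scalar case $m = n = p = 1$ already shows the failure by hand: with $M = a$ and $N = b$, the squared norm of the lift of $\dot X$ is $\dot X^2 / (a^2 + b^2)$, which changes when $(a, b)$ is replaced by $(ar, b/r)$.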



5 $\mathcal{M}(p, m\times n)$ as a Riemannian quotient manifold of $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$

In this section, we proceed as in Section 4, but now with a different Riemannian metric $\bar g$, defined in (12) below. As we will see, the rationale laid out in Section 3 now leads to the conclusion that $\pi : (\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}, \bar g) \to \mathcal{M}(p, m\times n)$, with $\bar g$ given by (12) instead of (11), can be turned into a Riemannian submersion. This endows $\mathcal{M}(p, m\times n)$ with a Riemannian metric, $g$. We then work out formulas for the Riemannian gradient and Hessian of $f$ on the Riemannian manifold $(\mathcal{M}(p, m\times n), g)$, and we state the corresponding Newton method.

5.1 Riemannian metric in total space

Inspired by the case of the Grassmann manifold viewed as a Riemannian quotient manifold of $\mathbb{R}_*^{n\times p}$ [AMS08, Example 3.6.4], we consider the Riemannian metric $\bar g$ on $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ defined by

\[ \bar g_{(M,N)}\bigl((\dot M, \dot N), (\check M, \check N)\bigr) := \operatorname{trace}\bigl((M^T M)^{-1} \dot M^T \check M + (N^T N)^{-1} \dot N^T \check N\bigr). \tag{12} \]

We now proceed to show that it satisfies the invariance condition (10).

5.2 Horizontal space 

The elements $(\dot M, \dot N)$ of the horizontal space $\mathcal{H}_{(M,N)}$ (8) are readily found to be characterized by

\[ M^T \dot M (M^T M)^{-1} = (N^T N)^{-1} \dot N^T N. \tag{13} \]

In other words,

\[ \mathcal{H}_{(M,N)} = \{(\dot M, \dot N) \in \mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p} : N^T N M^T \dot M = \dot N^T N M^T M\}. \tag{14} \]

5.3 Horizontal lift

Let $X = MN^T$ and let $\dot X_{MN^T}$ belong to $T_{MN^T}\mathcal{M}(p, m\times n)$. We seek an expression for the horizontal lift $\dot X_{(M,N)} = (\dot X_{M(M,N)}, \dot X_{N(M,N)})$ defined in (9). In view of (13), we find that the horizontality condition $(\dot X_{M(M,N)}, \dot X_{N(M,N)}) \in \mathcal{H}_{(M,N)}$ is equivalent to

\[ \dot X_{M(M,N)} = M_\perp L_M + M (M^T M)^{-1} K (M^T M), \tag{15a} \]
\[ \dot X_{N(M,N)} = N_\perp L_N + N (N^T N)^{-1} K^T (N^T N), \tag{15b} \]

where $L_M \in \mathbb{R}^{(m-p)\times p}$, $L_N \in \mathbb{R}^{(n-p)\times p}$ and $K \in \mathbb{R}^{p\times p}$. Since $\mathrm{D}\pi(M, N)[\dot X_{M(M,N)}, \dot X_{N(M,N)}] = M \dot X_{N(M,N)}^T + \dot X_{M(M,N)} N^T$, the definition (9) implies that

\[ \dot X_{MN^T} = M \dot X_{N(M,N)}^T + \dot X_{M(M,N)} N^T. \tag{16} \]

Replacing (15) in (16) yields

\[ \dot X_{MN^T} = M L_N^T N_\perp^T + M (N^T N) K (N^T N)^{-1} N^T + M_\perp L_M N^T + M (M^T M)^{-1} K (M^T M) N^T. \tag{17} \]

Multiplying (17) on the left by $(M^T M)^{-1} M^T$ and on the right by $N_\perp$ yields

\[ L_N^T = (M^T M)^{-1} M^T \dot X_{MN^T} N_\perp, \tag{18a} \]

multiplying (17) on the left by $M_\perp^T$ and on the right by $N (N^T N)^{-1}$ yields

\[ L_M = M_\perp^T \dot X_{MN^T} N (N^T N)^{-1}, \tag{18b} \]

and multiplying (17) on the left by $M^T$ and on the right by $N$ yields

\[ M^T \dot X_{MN^T} N = M^T M N^T N K + K M^T M N^T N. \tag{18c} \]

Replacing (18) into (15) yields

\[ \dot X_{M(M,N)} = M_\perp M_\perp^T \dot X_{MN^T} N (N^T N)^{-1} + M (M^T M)^{-1} K M^T M, \tag{19a} \]
\[ \dot X_{N(M,N)} = N_\perp N_\perp^T \dot X_{MN^T}^T M (M^T M)^{-1} + N (N^T N)^{-1} K^T N^T N. \tag{19b} \]

We can further exploit the identities $M_\perp M_\perp^T = I - M (M^T M)^{-1} M^T$, and likewise for $N$, together with (18c), to rewrite (19) as

\[ \dot X_{M(M,N)} = (\dot X_{MN^T} N - M N^T N K)(N^T N)^{-1}, \tag{20a} \]
\[ \dot X_{N(M,N)} = (\dot X_{MN^T}^T M - N M^T M K^T)(M^T M)^{-1}. \tag{20b} \]

This result is formalized as follows:

Proposition 5.1 Consider the submersion $\pi$ (4) and the horizontal distribution (14). Let $(M, N) \in \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ and let $\dot X_{MN^T}$ be in $T_{MN^T}\mathcal{M}(p, m\times n)$. Then the horizontal lift of $\dot X_{MN^T}$ at $(M, N)$ is $\dot X_{(M,N)} = (\dot X_{M(M,N)}, \dot X_{N(M,N)})$ given by (20), where $K$ is the solution of the Sylvester equation (18c).
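The horizontal lift of Proposition 5.1 amounts to one small $p \times p$ Sylvester solve followed by a few matrix products. A minimal sketch with NumPy and SciPy follows; the function name `horizontal_lift`, the sizes and the random data are illustrative assumptions, not the authors' code. It also checks that the result maps to $\dot X_{MN^T}$ under $\mathrm{D}\pi$, as in (16), and satisfies the horizontality condition (14).

```python
import numpy as np
from scipy.linalg import solve_sylvester

def horizontal_lift(M, N, Xdot):
    """Horizontal lift (20) of Xdot at (M, N) for the metric (12)."""
    MtM, NtN = M.T @ M, N.T @ N
    G = MtM @ NtN
    # (18c):  G K + K G = M^T Xdot N,  with  G = M^T M N^T N.
    K = solve_sylvester(G, G, M.T @ Xdot @ N)
    XM = (Xdot @ N - M @ NtN @ K) @ np.linalg.inv(NtN)       # (20a)
    XN = (Xdot.T @ M - N @ MtM @ K.T) @ np.linalg.inv(MtM)   # (20b)
    return XM, XN

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2
M = rng.standard_normal((m, p))
N = rng.standard_normal((n, p))
# A tangent vector at M N^T, built as an element of the range of D pi(M, N).
Xdot = rng.standard_normal((m, p)) @ N.T + M @ rng.standard_normal((n, p)).T

XM, XN = horizontal_lift(M, N, Xdot)
maps_to_Xdot = np.allclose(XM @ N.T + M @ XN.T, Xdot)                  # (16)
is_horizontal = np.allclose(N.T @ N @ M.T @ XM, XN.T @ N @ M.T @ M)    # (14)
print(maps_to_Xdot, is_horizontal)
```

The Sylvester equation is uniquely solvable because $M^T M N^T N$ is a product of two symmetric positive-definite matrices and hence has positive eigenvalues.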
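
5.4 Constitutive equation of horizontal lifts

A horizontal lift $\dot X_{(M,N)}$ fully specifies $\dot X_{MN^T} = \mathrm{D}\pi(M, N)[\dot X_{(M,N)}] \in T_{MN^T}\mathcal{M}(p, m\times n)$ as well as its horizontal lift at any other point of the fiber $\pi^{-1}(MN^T)$ (6). Let us obtain an expression for $\dot X_{(MR, NR^{-T})}$ in terms of $\dot X_{(M,N)}$. The expression (20) of horizontal lifts yields after routine manipulations (if $K$ solves (18c) at $(M, N)$, then $R^T K R^{-T}$ solves it at $(MR, NR^{-T})$)

\[ \dot X_{M(MR, NR^{-T})} = \dot X_{M(M,N)} R, \qquad \dot X_{N(MR, NR^{-T})} = \dot X_{N(M,N)} R^{-T}. \tag{21} \]

We have obtained:

Proposition 5.2 Consider the submersion $\pi$ (4) and the horizontal distribution (14). Then a vector field $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p} \ni (M, N) \mapsto \dot X_{(M,N)} \in \mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$ is a horizontal lift if and only if (21) holds for all $(M, N) \in \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ and all $R \in \mathrm{GL}(p)$.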

5.5 Riemannian submersion 

Routine manipulations using (21) yield that $\bar g$ (12) satisfies the invariance condition (10). Hence there is a (unique) Riemannian metric $g$ on $\mathcal{M}(p, m\times n)$ that makes

\[ \pi : (\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}, \bar g) \to (\mathcal{M}(p, m\times n), g) : (M, N) \mapsto MN^T \tag{22} \]

a Riemannian submersion. The Riemannian metric $g$ is consistently defined by

\[ g_{MN^T}\bigl(\dot X_{MN^T}, \check X_{MN^T}\bigr) := \bar g_{(M,N)}\bigl(\dot X_{(M,N)}, \check X_{(M,N)}\bigr). \tag{23} \]

5.6 Horizontal projection 

We will need an expression for the projection $P^h_{(M,N)}(\dot M, \dot N)$ of $(\dot M, \dot N) \in \mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$ onto the horizontal space (14) along the vertical space (7).

Since the projection is along the vertical space, we have

\[ P^h_{(M,N)}(\dot M, \dot N) = (\dot M + MR, \ \dot N - NR^T) \tag{24} \]

for some $R \in \mathbb{R}^{p\times p}$. It remains to obtain $R$ by imposing horizontality of (24). Since horizontal vectors are characterized by (13), we find that (24) is horizontal if and only if

\[ M^T (\dot M + MR)(M^T M)^{-1} = (N^T N)^{-1} (\dot N^T - R N^T) N, \]

that is,

\[ M^T M R (M^T M)^{-1} + (N^T N)^{-1} R N^T N = -M^T \dot M (M^T M)^{-1} + (N^T N)^{-1} \dot N^T N, \]

which can be rewritten as the Sylvester equation

\[ N^T N M^T M R + R N^T N M^T M = -N^T N M^T \dot M + \dot N^T N M^T M. \tag{25} \]

In summary:

Proposition 5.3 The projection $P^h_{(M,N)}(\dot M, \dot N)$ of $(\dot M, \dot N) \in \mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$ onto the horizontal space (14) along the vertical space (7) is given by (24), where $R$ is the solution of the Sylvester equation (25).
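Proposition 5.3 translates into a few lines of NumPy and SciPy. The sketch below uses the hypothetical name `horizontal_projection` and random test data, both illustrative assumptions. It verifies that the output satisfies the horizontality condition (14) and that vertical vectors (7) are sent to zero.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def horizontal_projection(M, N, Mdot, Ndot):
    """Projection (24) onto the horizontal space (14) along the vertical space (7)."""
    MtM, NtN = M.T @ M, N.T @ N
    G = NtN @ MtM
    rhs = -NtN @ M.T @ Mdot + Ndot.T @ N @ MtM
    R = solve_sylvester(G, G, rhs)           # (25):  G R + R G = rhs
    return Mdot + M @ R, Ndot - N @ R.T

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2
M = rng.standard_normal((m, p))
N = rng.standard_normal((n, p))
Mdot = rng.standard_normal((m, p))
Ndot = rng.standard_normal((n, p))

PM, PN = horizontal_projection(M, N, Mdot, Ndot)
is_horizontal = np.allclose(N.T @ N @ M.T @ PM, PN.T @ N @ M.T @ M)   # (14)

# A vertical vector (M Rv, -N Rv^T) should be projected to zero.
Rv = rng.standard_normal((p, p))
VM, VN = horizontal_projection(M, N, M @ Rv, -N @ Rv.T)
kills_vertical = np.allclose(VM, 0.0) and np.allclose(VN, 0.0)
print(is_horizontal, kills_vertical)
```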
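
5.7 Riemannian connection on the total space

Since the chosen Riemannian metric $\bar g$ (12) on the total space $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ is not the Euclidean metric (11), it can be expected that the Riemannian connection on $(\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}, \bar g)$ is not the plain differential. We show that this is indeed the case and we provide a formula for the Riemannian connection $\bar\nabla$ on $(\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}, \bar g)$. The motivation for obtaining this formula is that the Riemannian Newton equation on $(\mathcal{M}(p, m\times n), g)$ requires the Riemannian connection on $(\mathcal{M}(p, m\times n), g)$, which is readily obtained from $\bar\nabla$ as we will see in Section 5.8. The general theory of Riemannian connections (also called Levi-Civita connections) can be found in [AMS08, §5.3] or in any Riemannian geometry textbook such as [dC92].

The development relies on Koszul's formula

\[ 2 g(\nabla_\chi \eta, \xi) = \partial_\chi g(\eta, \xi) + \partial_\eta g(\chi, \xi) - \partial_\xi g(\chi, \eta) + g([\chi, \eta], \xi) - g([\chi, \xi], \eta) - g([\eta, \xi], \chi). \tag{26} \]

After lengthy but routine calculations, we obtain the following expression for the Riemannian connection $\bar\nabla$ on $(\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}, \bar g)$:

\[ (\bar\nabla_{\dot X} Y)_M = \partial_{\dot X} Y_M - Y_M (M^T M)^{-1} \operatorname{sym}(\dot X_M^T M) - \dot X_M (M^T M)^{-1} \operatorname{sym}(Y_M^T M) + M (M^T M)^{-1} \operatorname{sym}(\dot X_M^T Y_M) \tag{27a} \]

and

\[ (\bar\nabla_{\dot X} Y)_N = \partial_{\dot X} Y_N - Y_N (N^T N)^{-1} \operatorname{sym}(\dot X_N^T N) - \dot X_N (N^T N)^{-1} \operatorname{sym}(Y_N^T N) + N (N^T N)^{-1} \operatorname{sym}(\dot X_N^T Y_N), \tag{27b} \]

for all $(M, N) \in \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$, all $\dot X \in T_{(M,N)}\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ and all tangent vector fields $Y$ on $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$.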

5.8 Connection on the quotient space 

Let $\nabla$ denote the Riemannian connection on the quotient space $\mathcal{M}(p, m\times n)$ endowed with the Riemannian metric $g$ (23). A classical result in the theory of Riemannian submersions (see [O'N66, Lemma 1] or [AMS08, §5.3.4]) states that

\[ \bigl(\nabla_{\dot X_{MN^T}} Y\bigr)_{(M,N)} = P^h_{(M,N)}\bigl(\bar\nabla_{\dot X_{(M,N)}} Y\bigr) \]

for all $\dot X_{MN^T} \in T_{MN^T}\mathcal{M}(p, m\times n)$ and all tangent vector fields $Y$ on $\mathcal{M}(p, m\times n)$. That is, the horizontal lift of the Riemannian connection of the quotient space is given by the horizontal projection (24) of the Riemannian connection (27) of the total space. (The tangent vector field $Y$ on the right-hand side denotes the horizontal lift of the tangent vector field $Y$ of the left-hand side.)

5.9 Riemannian Newton equation 

For a real-valued function $f$ on a Riemannian manifold $\mathcal{M}$ with Riemannian metric $g$, we let $\operatorname{grad} f(x)$ denote the gradient of $f$ at $x \in \mathcal{M}$, defined as the unique tangent vector to $\mathcal{M}$ at $x$ that satisfies $g_x(\operatorname{grad} f(x), \xi_x) = \mathrm{D}f(x)[\xi_x]$ for all $\xi_x \in T_x\mathcal{M}$. The plain Riemannian Newton equation is given by

\[ \nabla_{\eta_x} \operatorname{grad} f = -\operatorname{grad} f(x) \]

for the unknown $\eta_x \in T_x\mathcal{M}$, where $\nabla$ stands for the Riemannian connection; see, e.g., [AMS08, §6.2].

We now turn to the manifold $\mathcal{M}(p, m\times n)$ endowed with the Riemannian metric $g$ (23) and we obtain an expression of the Riemannian Newton equation by means of its horizontal lift through the Riemannian submersion $\pi$ (22). First, on the total space $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ endowed with the Riemannian metric $\bar g$ (12), we readily obtain the following expression for the gradient of $\bar f$ (5):

\[ \operatorname{grad} \bar f(M, N) = \bigl(\partial_M \bar f(M, N)\, M^T M, \ \partial_N \bar f(M, N)\, N^T N\bigr), \]

where $\partial_M \bar f(M, N)$ denotes the Euclidean (i.e., classical) gradient of $\bar f$ with respect to its first argument, i.e., $(\partial_M \bar f(M, N))_{ij} = \frac{d}{dt} \bar f(M + t e_i e_j^T, N)\big|_{t=0}$, and likewise for $\partial_N \bar f(M, N)$ with the second argument. Then the horizontal lift of the Newton equation at a point $(M, N)$ of the total space $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$, for the unknown $\dot X_{(M,N)}$ in the horizontal space $\mathcal{H}_{(M,N)}$ (14), is

\[ P^h_{(M,N)}\bigl(\bar\nabla_{\dot X_{(M,N)}} \operatorname{grad} \bar f\bigr) = -\operatorname{grad} \bar f(M, N), \tag{28} \]

where $P^h$ is the horizontal projection given in Section 5.6 and $\bar\nabla$ is the Riemannian connection on $(\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}, \bar g)$ given in Section 5.7. To obtain (28), we have used the fact (see [AMS08, (3.39)]) that $\overline{\operatorname{grad} f}(M, N) = \operatorname{grad} \bar f(M, N)$, where the left-hand side denotes the horizontal lift of $\operatorname{grad} f(MN^T)$ at $(M, N)$.

Intimidating as it may be in view of the expressions of $P^h$ and $\bar\nabla$, the Newton equation (28) is nevertheless merely a linear system of equations. Indeed, $\dot X_{(M,N)} \mapsto P^h_{(M,N)}\bigl(\bar\nabla_{\dot X_{(M,N)}} \operatorname{grad} \bar f\bigr)$ is a linear transformation of the horizontal space $\mathcal{H}_{(M,N)}$. Thus (28) can be solved using "matrix-free" linear solvers such as GMRES. Moreover, in addition to computing the Euclidean gradient of $\bar f$ and the Euclidean derivative of the Euclidean gradient of $\bar f$ along $\dot X_{(M,N)}$, computing $\bar\nabla_{\dot X_{(M,N)}} \operatorname{grad} \bar f$ requires only $O(p^2(m + n + p))$ flops.

5.10 Newton's method 

In order to spell out on $(\mathcal{M}(p, m\times n), g)$ the Riemannian Newton method as defined in [AMS08, §6.2], the last missing ingredient is a retraction $R$ that turns the Newton vector $\dot X_{MN^T}$ into an updated iterate $R_{MN^T}(\dot X_{MN^T})$ in $\mathcal{M}(p, m\times n)$. The general definition of a retraction can be found in [AMS08, §4.1].

The quintessential retraction on a Riemannian manifold is the Riemannian exponential; see [AMS08, §5.4]. However, computing the Riemannian exponential amounts to solving the differential equation $\nabla_{\dot\gamma} \dot\gamma = 0$, which may not admit a closed-form solution. In the case of $(\mathcal{M}(p, m\times n), g)$, we are not aware of such a closed-form solution, and this makes the exponential retraction impractical.

Fortunately, other retractions are readily available. A retraction on $\mathcal{M}(p, m\times n)$ is given by

\[ R_{MN^T}(\dot X_{MN^T}) := \bigl(M + \dot X_{M(M,N)}\bigr)\bigl(N + \dot X_{N(M,N)}\bigr)^T, \tag{29} \]

where $\dot X_{M(M,N)}$ and $\dot X_{N(M,N)}$ are horizontal lifts as defined in Proposition 5.1. It is readily checked that the definition is consistent, i.e., it depends on $MN^T$ and not on the specific choices of $(M, N)$ in the fiber (6).
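The consistency check can be spelled out in one line. The computation below assumes that the lifts at another point $(MR, NR^{-T})$ of the fiber are related to those at $(M, N)$ by $\dot X_{M(MR, NR^{-T})} = \dot X_{M(M,N)} R$ and $\dot X_{N(MR, NR^{-T})} = \dot X_{N(M,N)} R^{-T}$, which is the constitutive relation of Section 5.4:

```latex
\begin{aligned}
\bigl(MR + \dot X_{M(M,N)} R\bigr)\bigl(NR^{-T} + \dot X_{N(M,N)} R^{-T}\bigr)^T
  &= \bigl(M + \dot X_{M(M,N)}\bigr)\, R\, R^{-1}\, \bigl(N + \dot X_{N(M,N)}\bigr)^T \\
  &= \bigl(M + \dot X_{M(M,N)}\bigr)\bigl(N + \dot X_{N(M,N)}\bigr)^T .
\end{aligned}
```

The factor $R R^{-1}$ cancels, so evaluating (29) at $(MR, NR^{-T})$ or at $(M, N)$ gives the same matrix.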

With all these elements in place, we can describe Newton's method as follows.

Theorem 5.4 (Riemannian Newton on $\mathcal{M}(p, m\times n)$ with Riemannian metric (23)) Let $f$ be a real-valued function on the Riemannian manifold $\mathcal{M}(p, m\times n)$ (1), endowed with the Riemannian metric $g$ (23), with the associated Riemannian connection, and with the retraction (29). Then the Riemannian Newton method for $f$ maps $MN^T \in \mathcal{M}(p, m\times n)$ to $(M + \dot X_M)(N + \dot X_N)^T$, where $(\dot X_M, \dot X_N)$ is the solution $\dot X_{(M,N)}$ of the Newton equation (28).

Note that, in practice, it is not necessary to form $MN^T$. Given an initial point $M_0 N_0^T$, one can instead generate a sequence $\{(M_k, N_k)\}$ in $\mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ by applying the iteration map $(M, N) \mapsto (M + \dot X_M, N + \dot X_N)$. The Newton sequence on $\mathcal{M}(p, m\times n)$ is then $\{M_k N_k^T\}$, and it depends on $M_0 N_0^T$ but not on the particular $M_0$ and $N_0$.

The following convergence result follows directly from the general convergence analysis of the Riemannian Newton method [AMS08, Theorem 6.3.2]. A critical point of $f : \mathcal{M}(p, m\times n) \to \mathbb{R}$ is a point $X_*$ where $\operatorname{grad} f(X_*) = 0$. It is termed nondegenerate if the Hessian $T_{X_*}\mathcal{M}(p, m\times n) \ni \dot X \mapsto \nabla_{\dot X} \operatorname{grad} f \in T_{X_*}\mathcal{M}(p, m\times n)$ is invertible. These definitions do not depend on the Riemannian metric nor on the affine connection $\nabla$.

Theorem 5.5 (quadratic convergence) Let $X_*$ be a nondegenerate critical point of $f$. Then there exists a neighborhood $\mathcal{U}$ of $X_*$ in $\mathcal{M}(p, m\times n)$ such that, for every initial iterate $X_0 \in \mathcal{U}$, the iteration described in Theorem 5.4 generates an infinite sequence $\{X_k\}$ converging superlinearly (at least quadratically) to $X_*$.

6 $\mathcal{M}(p, m\times n)$ as a Riemannian quotient manifold with an orthonormal factor

We now follow the second plan of action mentioned at the end of Section 1. Bear in mind that the meaning of much of the notation introduced above will be superseded by new definitions below.

6.1 A smaller total space

Let

\[ \mathrm{St}(p, m) = \{M \in \mathbb{R}^{m\times p} : M^T M = I_p\} \tag{30} \]

denote the Stiefel manifold of orthonormal $m\times p$ matrices. For all $X \in \mathcal{M}(p, m\times n)$, there exists $(M, N)$ with $M$ orthonormal such that $X = MN^T$. To see this, take $(M, N) \in \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p}$ such that $X = MN^T$, let $M = QR$ be a QR decomposition of $M$, where $R$ is invertible since $M$ has full rank, and observe that $X = MR^{-1}(NR^T)^T = Q(NR^T)^T$. Hence

\[ \pi : \mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p} \to \mathcal{M}(p, m\times n) : (M, N) \mapsto MN^T \tag{31} \]

is a smooth surjective map between two manifolds.
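The surjectivity argument is constructive and can be sketched with NumPy's reduced QR factorization. The sizes and random factors below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2
M = rng.standard_normal((m, p))     # full rank with probability one
N = rng.standard_normal((n, p))

# Reduced QR: Q is m x p with orthonormal columns, R is p x p upper triangular.
Q, R = np.linalg.qr(M)
N_new = N @ R.T                     # absorb R into the second factor

on_stiefel = np.allclose(Q.T @ Q, np.eye(p))
same_matrix = np.allclose(Q @ N_new.T, M @ N.T)
print(on_stiefel, same_matrix)
```

The pair `(Q, N_new)` represents the same rank-$p$ matrix as `(M, N)`, now with an orthonormal first factor.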

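As in Section 2, but now with the restricted total space $\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$, we show that $\pi$ (31) is a submersion. The tangent space at $M$ to $\mathrm{St}(p, m)$ is given by (see [AMS08, Example 3.5.2])

\[ T_M \mathrm{St}(p, m) = \{\dot M \in \mathbb{R}^{m\times p} : M^T \dot M + \dot M^T M = 0\} = \{M\Omega + M_\perp W : \Omega = -\Omega^T \in \mathbb{R}^{p\times p}, \ W \in \mathbb{R}^{(m-p)\times p}\}, \]

and we have

\[ T_{(M,N)}\,\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p} = (T_M \mathrm{St}(p, m)) \times \mathbb{R}^{n\times p}. \]

For all $(M, N) \in \mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ and all $(\dot M, \dot N) \in T_{(M,N)}\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$, we have that $\mathrm{D}\pi(M, N)[(\dot M, \dot N)] = \dot M N^T + M \dot N^T$. Here again, we can work in a coordinate system where $M = [I \;\; 0]^T$ and $N = [I \;\; 0]^T$. We have that

\[ \{\mathrm{D}\pi(M, N)[(\dot M, \dot N)] : (\dot M, \dot N) \in T_{(M,N)}\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}\} = \left\{ \begin{bmatrix} \Omega + \dot N_1^T & \dot N_2^T \\ W & 0 \end{bmatrix} : \Omega = -\Omega^T \in \mathbb{R}^{p\times p}, \ \dot N_1 \in \mathbb{R}^{p\times p}, \ \dot N_2 \in \mathbb{R}^{(n-p)\times p}, \ W \in \mathbb{R}^{(m-p)\times p} \right\}, \]

a linear subspace of dimension $p^2 + (n - p)p + (m - p)p = p(m + n - p)$, which is the dimension of $\mathcal{M}(p, m\times n)$. Hence $\pi$ (31) is a submersion.

The fiber of $\pi$ (31) at $MN^T$ is now

\[ \pi^{-1}(MN^T) = \{(MR, NR) : R \in O(p)\}, \tag{32} \]

where

\[ O(p) = \{R \in \mathbb{R}^{p\times p} : R^T R = I_p\} \]

denotes the orthogonal group of degree $p$.

The vertical space $\mathcal{V}_{(M,N)}$ at a point $(M, N) \in \mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$, i.e., the tangent space to the fiber $\pi^{-1}(MN^T)$ at $(M, N)$, is given by

\[ \mathcal{V}_{(M,N)} = \{(M\Omega, N\Omega) : \Omega = -\Omega^T \in \mathbb{R}^{p\times p}\}. \tag{33} \]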

6.2 Riemannian metric in total space 

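We consider $\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ as a Riemannian submanifold of the Euclidean space $\mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$. This endows $\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ with the Riemannian metric $\bar g$ defined by

\[ \bar g_{(M,N)}\bigl((\dot M, \dot N), (\check M, \check N)\bigr) := \operatorname{trace}\bigl(\dot M^T \check M + \dot N^T \check N\bigr) \tag{34} \]

for all $(\dot M, \dot N)$ and $(\check M, \check N)$ in $T_{(M,N)}\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$.

Adapting the rationale of Section 5, we will obtain in Section 6.6 below that, with this $\bar g$, $\pi$ (31) can be turned into a Riemannian submersion.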

6.3 Horizontal space 

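The horizontal space $\mathcal{H}_{(M,N)}$ is the orthogonal complement to $\mathcal{V}_{(M,N)}$ (33) in $T_{(M,N)}\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ with respect to $\bar g$ (34). The following propositions are equivalent:

\[ (\dot M, \dot N) \in \mathcal{H}_{(M,N)}, \]
\[ \dot M \in T_M \mathrm{St}(p, m), \quad \dot N \in \mathbb{R}^{n\times p}, \quad \operatorname{tr}\bigl(\Omega^T (M^T \dot M + N^T \dot N)\bigr) = 0 \ \ \forall\, \Omega = -\Omega^T, \]
\[ M^T \dot M = -(M^T \dot M)^T, \quad M^T \dot M + N^T \dot N = (M^T \dot M + N^T \dot N)^T, \tag{35} \]
\[ \dot M = M\Omega + M_\perp W, \quad \dot N = N (N^T N)^{-1} (-\Omega + S) + N_\perp L, \]

with $W \in \mathbb{R}^{(m-p)\times p}$, $\Omega = -\Omega^T \in \mathbb{R}^{p\times p}$, $S = S^T \in \mathbb{R}^{p\times p}$, $L \in \mathbb{R}^{(n-p)\times p}$. In summary,

\[ \mathcal{H}_{(M,N)} = \{(\dot M, \dot N) : M^T \dot M = -(M^T \dot M)^T, \ M^T \dot M + N^T \dot N = (M^T \dot M + N^T \dot N)^T\}. \tag{36} \]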

6.4 Horizontal lift

Proceeding as in Section 5.3, but now with the horizontal space (36) and taking into account that $M^T M = I$, we obtain that the horizontal lift at $(M, N) \in \mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ of $\dot X_{MN^T} \in T_{MN^T}\mathcal{M}(p, m\times n)$ is given by

\[ \dot X_{M(M,N)} = M\Omega + M_\perp M_\perp^T \dot X_{MN^T} N (N^T N)^{-1}, \tag{37a} \]
\[ \dot X_{N(M,N)} = N (N^T N)^{-1} (S - \Omega) + N_\perp N_\perp^T \dot X_{MN^T}^T M, \tag{37b} \]
\[ \text{where} \quad \Omega (N^T N + I) + S = M^T \dot X_{MN^T} N, \quad \Omega = -\Omega^T, \quad S = S^T. \tag{37c} \]

Equation (37c) is equivalent to

\[ \Omega (N^T N + I) + (N^T N + I) \Omega = M^T \dot X_{MN^T} N - N^T \dot X_{MN^T}^T M, \tag{38a} \]
\[ S = M^T \dot X_{MN^T} N - \Omega (N^T N + I). \tag{38b} \]

As for the first two equations of (37), using (37c), they can be rewritten as

\[ \dot X_{M(M,N)} = \dot X_{MN^T} N (N^T N)^{-1} - M (\Omega + S)(N^T N)^{-1}, \tag{39a} \]
\[ \dot X_{N(M,N)} = \dot X_{MN^T}^T M + N\Omega. \tag{39b} \]

In summary,

Proposition 6.1 Consider the submersion $\pi$ (31) and the horizontal distribution (36). Let $(M, N) \in \mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ and let $\dot X_{MN^T} \in T_{MN^T}\mathcal{M}(p, m\times n)$. Then the horizontal lift of $\dot X_{MN^T}$ at $(M, N)$ is $\dot X_{(M,N)} = (\dot X_{M(M,N)}, \dot X_{N(M,N)})$ given by (39), where $\Omega$ is the solution of the Sylvester equation (38a) and $S$ is given by (38b).
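A minimal NumPy/SciPy sketch of Proposition 6.1 is given below. The function name `horizontal_lift_stiefel`, the sizes and the random data are illustrative assumptions. The checks confirm that $\Omega$ is skew-symmetric, that $S$ is symmetric, that the $M$-component is tangent to the Stiefel manifold, and that the lift maps to $\dot X_{MN^T}$ under $\mathrm{D}\pi$.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def horizontal_lift_stiefel(M, N, Xdot):
    """Horizontal lift (39) at (M, N), with M orthonormal, for the metric (34)."""
    p = M.shape[1]
    B = N.T @ N + np.eye(p)
    A = M.T @ Xdot @ N
    Omega = solve_sylvester(B, B, A - A.T)     # (38a):  B Omega + Omega B = A - A^T
    S = A - Omega @ B                          # (38b)
    NtN_inv = np.linalg.inv(N.T @ N)
    XM = Xdot @ N @ NtN_inv - M @ (Omega + S) @ NtN_inv   # (39a)
    XN = Xdot.T @ M + N @ Omega                            # (39b)
    return XM, XN, Omega, S

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2
M, _ = np.linalg.qr(rng.standard_normal((m, p)))   # orthonormal first factor
N = rng.standard_normal((n, p))
# A tangent vector at M N^T, built as Mdot N^T + M Ndot^T.
Xdot = rng.standard_normal((m, p)) @ N.T + M @ rng.standard_normal((n, p)).T

XM, XN, Omega, S = horizontal_lift_stiefel(M, N, Xdot)
omega_skew = np.allclose(Omega, -Omega.T)
s_sym = np.allclose(S, S.T)
tangent_to_stiefel = np.allclose(M.T @ XM, -(M.T @ XM).T)
maps_to_Xdot = np.allclose(XM @ N.T + M @ XN.T, Xdot)
print(omega_skew, s_sym, tangent_to_stiefel, maps_to_Xdot)
```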

6.5 Constitutive equation of horizontal lifts 

From Proposition 6.1, routine manipulations lead to the following constitutive equation for horizontal lifts:

\[ \dot X_{M(MR, NR)} = \dot X_{M(M,N)} R, \qquad \dot X_{N(MR, NR)} = \dot X_{N(M,N)} R. \tag{40} \]

Hence we have the following counterpart of Proposition 5.2.

Proposition 6.2 Consider the submersion $\pi$ (31) and the horizontal distribution (36). Then a tangent vector field $\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p} \ni (M, N) \mapsto \dot X_{(M,N)} \in T_{(M,N)}\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ is a horizontal lift if and only if (40) holds for all $(M, N) \in \mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ and all $R \in O(p)$.

6.6 Riemannian submersion 

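From Proposition 6.2 and the properties of the trace, it is direct that $\bar g$ (34) satisfies the invariance condition

\[ \bar g_{(M,N)}\bigl(\dot X_{(M,N)}, \check X_{(M,N)}\bigr) = \bar g_{(MR, NR)}\bigl(\dot X_{(MR, NR)}, \check X_{(MR, NR)}\bigr). \tag{41} \]

Hence one consistently defines a Riemannian metric $g$ on $\mathcal{M}(p, m\times n)$ by

\[ g_{MN^T}\bigl(\dot X_{MN^T}, \check X_{MN^T}\bigr) = \bar g_{(M,N)}\bigl(\dot X_{(M,N)}, \check X_{(M,N)}\bigr), \tag{42} \]

and $\pi : (\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}, \bar g) \to (\mathcal{M}(p, m\times n), g)$ is a Riemannian submersion.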

6.7 Horizontal projection

We now obtain an expression for the projection $P^h_{(M,N)}(\dot M, \dot N)$ of $(\dot M, \dot N) \in T_{(M,N)}\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ onto the horizontal space (36) along the vertical space (33). Since the projection is along the vertical space, we have

\[ P^h_{(M,N)}(\dot M, \dot N) = (\dot M + M\Omega, \ \dot N + N\Omega) \tag{43} \]

for some $\Omega = -\Omega^T \in \mathbb{R}^{p\times p}$. It remains to obtain $\Omega$ by imposing horizontality of (43). The characterization of horizontal vectors given in (35) yields the Sylvester equation

\[ (N^T N + I)\Omega + \Omega (N^T N + I) = \dot M^T M - M^T \dot M + \dot N^T N - N^T \dot N. \tag{44} \]

In summary:

Proposition 6.3 The projection $P^h_{(M,N)}(\dot M, \dot N)$ of $(\dot M, \dot N) \in T_{(M,N)}\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ onto the horizontal space (36) along the vertical space (33) is given by (43), where $\Omega$ is the solution of the Sylvester equation (44).
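Proposition 6.3 can be sketched in the same style with NumPy and SciPy. The function name `horizontal_projection_stiefel` and the random test data are illustrative assumptions. The input $\dot M$ is first made tangent to the Stiefel manifold by removing the symmetric part of $M^T \dot M$.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def horizontal_projection_stiefel(M, N, Mdot, Ndot):
    """Projection (43) onto the horizontal space (36) along the vertical space (33)."""
    p = M.shape[1]
    B = N.T @ N + np.eye(p)
    rhs = Mdot.T @ M - M.T @ Mdot + Ndot.T @ N - N.T @ Ndot
    Omega = solve_sylvester(B, B, rhs)        # (44):  B Omega + Omega B = rhs
    return Mdot + M @ Omega, Ndot + N @ Omega

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2
M, _ = np.linalg.qr(rng.standard_normal((m, p)))
N = rng.standard_normal((n, p))

Z = rng.standard_normal((m, p))
Mdot = Z - M @ (M.T @ Z + Z.T @ M) / 2.0      # tangent to St(p, m) at M
Ndot = rng.standard_normal((n, p))

PM, PN = horizontal_projection_stiefel(M, N, Mdot, Ndot)
skew_part = M.T @ PM
sym_part = M.T @ PM + N.T @ PN
is_horizontal = np.allclose(skew_part, -skew_part.T) and np.allclose(sym_part, sym_part.T)

# A vertical vector (M W, N W), with W skew-symmetric, should be projected to zero.
W = rng.standard_normal((p, p))
W = W - W.T
VM, VN = horizontal_projection_stiefel(M, N, M @ W, N @ W)
kills_vertical = np.allclose(VM, 0.0) and np.allclose(VN, 0.0)
print(is_horizontal, kills_vertical)
```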

6.8 Riemannian connection on the total space 

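Let $P^{\mathrm{St}}_M$ denote the orthogonal projection from $\mathbb{R}^{m\times p}$ onto $T_M \mathrm{St}(p, m)$, given by (see [AMS08, Example 5.3.2])

\[ P^{\mathrm{St}}_M \dot M = (I - MM^T)\dot M + M \operatorname{skew}(M^T \dot M) = \dot M - M \operatorname{sym}(M^T \dot M), \tag{45} \]

where $\operatorname{skew}(Z) := \tfrac{1}{2}(Z - Z^T)$ and $\operatorname{sym}(Z) := \tfrac{1}{2}(Z + Z^T)$. We also let $P^{\mathrm{St}\times\mathbb{R}}_{(M,N)}$ denote the orthogonal projection from $\mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$ onto $T_{(M,N)}\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$, given by

\[ P^{\mathrm{St}\times\mathbb{R}}_{(M,N)}(\dot M, \dot N) = (P^{\mathrm{St}}_M \dot M, \ \dot N). \tag{46} \]

Since $\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$, endowed with the Riemannian metric $\bar g$ (34), is a Riemannian submanifold of the Euclidean space $\mathbb{R}^{m\times p} \times \mathbb{R}^{n\times p}$, a classical result of Riemannian geometry (see [AMS08, §5.3.3]) yields that the Riemannian connection $\bar\nabla$ on $(\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}, \bar g)$ is given by

\[ \bar\nabla_{\dot X} Y = P^{\mathrm{St}\times\mathbb{R}}_{(M,N)}\, \partial_{\dot X} Y, \]

that is,

\[ (\bar\nabla_{\dot X} Y)_M = P^{\mathrm{St}}_M (\partial_{\dot X} Y_M), \tag{47a} \]
\[ (\bar\nabla_{\dot X} Y)_N = \partial_{\dot X} Y_N, \tag{47b} \]

for all $(M, N) \in \mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$, all $\dot X \in T_{(M,N)}\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ and all vector fields $Y$ on $\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$.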

6.9 Connection on the quotient space

As in Section 5.8, we can now provide an expression for the Riemannian connection $\nabla$ on the manifold $\mathcal{M}(p, m\times n)$ endowed with the Riemannian metric $g$ (42):

\[ \bigl(\nabla_{\dot X_{MN^T}} Y\bigr)_{(M,N)} = P^h_{(M,N)}\, \bar\nabla_{\dot X_{(M,N)}} Y = P^h_{(M,N)}\, P^{\mathrm{St}\times\mathbb{R}}_{(M,N)}\, \partial_{\dot X_{(M,N)}} Y, \]

with $P^h$ as in (43) and $P^{\mathrm{St}\times\mathbb{R}}$ as in (46). (Observe that $Y$ of the right-hand side is the horizontal lift of $Y$ of the left-hand side.)



6.10 Riemannian Newton equation 

Given $f : \mathcal{M}(p, m\times n) \to \mathbb{R}$, define $\bar f = f \circ \pi$, i.e.,

\[ \bar f : \mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p} \to \mathbb{R} : (M, N) \mapsto f(MN^T), \]

and define

\[ \bar{\bar f} : \mathbb{R}_*^{m\times p} \times \mathbb{R}_*^{n\times p} \to \mathbb{R} : (M, N) \mapsto f(MN^T). \]

Let $\operatorname{grad} \bar{\bar f}$ denote the Euclidean gradient of $\bar{\bar f}$. We have (see [AMS08, (3.37)])

\[ \operatorname{grad} \bar f(M, N) = P^{\mathrm{St}\times\mathbb{R}}_{(M,N)} \operatorname{grad} \bar{\bar f}(M, N) \tag{48} \]

and (see [AMS08, (3.39)])

\[ \overline{\operatorname{grad} f}(M, N) = \operatorname{grad} \bar f(M, N), \]

where the left-hand side stands for the horizontal lift at $(M, N)$ of $\operatorname{grad} f(MN^T)$.

We can now obtain the counterpart of the (lifted) Newton equation (28) with normalization on the $M$ factor:

\[ P^h_{(M,N)}\bigl(\bar\nabla_{\dot X_{(M,N)}} \operatorname{grad} \bar f\bigr) = -\operatorname{grad} \bar f(M, N), \tag{49} \]

where $P^h$ is the horizontal projection given in Section 6.7, $\bar\nabla$ is the Riemannian connection on $(\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}, \bar g)$ given in Section 6.8, and $\operatorname{grad} \bar f$ is obtained from the Euclidean gradient of $\bar{\bar f}$ from (48).

The Newton equation (49) can be considered less intricate than in the non-orthonormal case (28) because the expression for $\bar\nabla$ in (47) is simpler than in (27). In any case, the discussion that follows (28) applies equally: the Newton equation is merely a linear system of equations, and the Riemannian overhead requires only $O(p^2(m + n + p))$ flops.



6.11 Newton's method 



Another reward that comes with the orthonormalization of the $M$ factor is that the Riemannian exponential with respect to $g$ (42) admits a closed-form expression. First, we point out that, in view of [EAS98, §2.2.2], the Riemannian exponential on $\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}$ for the Riemannian metric $\bar g$ (34) is given by

\[ \operatorname{Exp}_{(M,N)}(\dot M, \dot N) = \left( [M \;\; \dot M] \exp\begin{bmatrix} A & -S \\ I & A \end{bmatrix} I_{2p,p} \exp(-A), \ N + \dot N \right), \tag{50} \]

where $A := M^T \dot M$ and $S := \dot M^T \dot M$, where $I_{2p,p}$ denotes the first $p$ columns of the identity matrix of size $2p$, and where $\exp$ stands for the matrix exponential (expm in Matlab). Second, since by [O'N83, Corollary 7.46] horizontal geodesics in $(\mathrm{St}(p, m) \times \mathbb{R}_*^{n\times p}, \bar g)$ map to geodesics in $(\mathcal{M}(p, m\times n), g)$, we have that

\[ \operatorname{Exp}_{MN^T}(\dot X_{MN^T}) = \pi\bigl(\operatorname{Exp}_{(M,N)}(\dot X_{M(M,N)}, \dot X_{N(M,N)})\bigr), \tag{51} \]

with $(\dot X_{M(M,N)}, \dot X_{N(M,N)})$ as in Proposition 6.1. (In (51), Exp on the right-hand side is given by (50) and Exp on the left-hand side denotes the Riemannian exponential of $(\mathcal{M}(p, m\times n), g)$.)

Observe that the matrix exponential is applied in (50) to matrices of size $2p \times 2p$ and $p \times p$; hence, when $p \ll m$, the cost of computing the $M$ component of (50) is comparable to the cost of computing the simple sum $M + \dot M$. Note also that, in practice, the $M$ component of the Newton iterates may gradually depart from orthonormality due to the accumulation of numerical errors; a remedy is to restore orthonormality by taking the Q factor of the unique QR decomposition where the diagonal of the R factor is positive.
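The exponential (50) on the total space can be sketched with `scipy.linalg.expm`. The function name `exp_total_space`, the sizes and the random data are illustrative assumptions; the tangent vector is constructed by projecting a random matrix onto the tangent space of the Stiefel manifold. The sketch then checks that the updated $M$ factor remains orthonormal.

```python
import numpy as np
from scipy.linalg import expm

def exp_total_space(M, N, Mdot, Ndot):
    """Riemannian exponential (50) on St(p, m) x R_*^{n x p} for the metric (34)."""
    p = M.shape[1]
    A = M.T @ Mdot                    # skew-symmetric when Mdot is tangent at M
    S = Mdot.T @ Mdot
    block = np.block([[A, -S], [np.eye(p), A]])               # 2p x 2p matrix
    I2pp = np.vstack([np.eye(p), np.zeros((p, p))])           # first p columns of I_{2p}
    M_new = np.hstack([M, Mdot]) @ expm(block) @ I2pp @ expm(-A)
    return M_new, N + Ndot

rng = np.random.default_rng(0)
m, n, p = 6, 5, 2
M, _ = np.linalg.qr(rng.standard_normal((m, p)))
N = rng.standard_normal((n, p))

Z = rng.standard_normal((m, p))
Mdot = Z - M @ (M.T @ Z + Z.T @ M) / 2.0      # tangent to St(p, m) at M
Ndot = rng.standard_normal((n, p))

M1, N1 = exp_total_space(M, N, Mdot, Ndot)
stays_orthonormal = np.allclose(M1.T @ M1, np.eye(p))
print(stays_orthonormal)
```

Only matrices of size $2p \times 2p$ and $p \times p$ are exponentiated, so the dominant cost is the product with the $m \times 2p$ matrix $[M \;\; \dot M]$, in line with the remark above.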

We can now formally describe Newton's method in the context of this Section 6.

Theorem 6.4 (Riemannian Newton on $\mathcal{M}(p, m\times n)$ with Riemannian metric (42)) Let $f$ be a real-valued function on the Riemannian manifold $\mathcal{M}(p, m\times n)$ (1), endowed with the Riemannian metric $g$ (42), with the associated Riemannian connection, and with the exponential retraction (51). Then the Riemannian Newton method for $f$ maps $MN^T \in \mathcal{M}(p, m\times n)$ to $\pi(\operatorname{Exp}_{(M,N)}(\dot X_M, \dot X_N))$, where $\pi$ is given in (31), Exp is defined in (50), and $(\dot X_M, \dot X_N)$ is the solution $\dot X_{(M,N)}$ of the Newton equation (49).

The quadratic convergence result in Theorem 5.5 still holds, replacing the reference to Theorem 5.4 by a reference to Theorem 6.4.

7 Conclusion 

We have reached the end of a technical hike that led us to give in Theorem 6.4 what is, to the best of our knowledge, the first closed-form description of a purely Riemannian Newton method on the set of all matrices of fixed dimension and rank. By "closed-form", we mean that, besides calling an oracle for Euclidean first and second derivatives, the method only needs to perform elementary matrix operations, solve linear systems of equations, and compute (small-size) matrix exponentials. By "purely Riemannian", we mean that it uses the tools provided by Riemannian geometry, namely, the Riemannian connection (instead of any other affine connection) and the Riemannian exponential (instead of any other retraction).

The developments strongly rely on the theory of Riemannian submersions and are based on factorizations of low-rank matrices $X$ as $MN^T$, where one of the factors is orthonormal. Relaxing the orthonormality constraint is more appealing for its symmetry (the two factors are treated alike), but it did not allow us to obtain a closed-form expression for the Riemannian exponential.